prof_info <- tibble(
professor = c("Bodwin",
"Glanz",
"Robinson",
"Theobold"),
undergrad_school = c("Harvard",
"Cal Poly",
"Winona State University",
"Colorado Mesa University"),
grad_school = c("UNC",
"Boston University",
"University of Nebraska-Lincoln",
"Montana State University")
)
prof_course <- tibble(
professor = c("Bodwin", "Robinson", "Theobold", "Carlton"),
Stat_331 = c(TRUE, TRUE, TRUE, TRUE),
Stat_330 = c(FALSE, TRUE, FALSE, FALSE),
Stat_431 = c(TRUE, TRUE, TRUE, FALSE)
)
course_info <- data.frame(
course = c("Stat_331", "Stat_330", "Stat_431"),
num_sections = c(8, 3, 1)
)Reshaping and Joining Data Frames
In this lesson, you will learn:
The importance of nicely formatted code and how to adhere to style “guidelines”
How to reshape your dataset by pivoting longer or wider
How to use mutating joins and filtering joins to combine different datasets with related information.
▶️ Watch Videos: 25 minutes
📖 Readings: 30-60 minutes
💻 Activities: 45-60 minutes
✅ Check-ins: 3
1 Code Style
✅ Check-in: Code Style
You’ve been spending time and getting feedback on the formatting of your code. Based on what you’ve read, address the following questions.
Question 1: Based on the tidyverse style guide, where should whitespace (blank spaces) go?
Before and after every
+signBefore and after every
%>%Before and after every athematic operator (
+,-,*,/)After every comma
Before every comma
Question 2: How many characters per line should you strive to have in your R code (and RMarkdown files)?
Question 3: Based on the styleguide, which of the following are bad object names?
datacereals_datacereals.datacerealsDatacerealsdatacereals_data1cereals_data_clean
2 Reshaping Data
2.1 Tidy Data and Reshaping
📖 Required Reading: R4DS Chapter 6 (Data tidying)
▶️ Required Video: Data Layouts – 7 minutes
▶️ Required Video: Pivoting Longer – 9 minutes
▶️ Required Video: Pivoting Wider – 9 minutes
✅ Check-in 4.1: Pivoting
Download the cereal.csv data here.
Question 1: Create a new dataset called cereals_long, that has three columns:
The
nameof the cerealA column called
Nutrientwith values"protein","fat", or"fiber".A column called
Amountwith the corresponding amount of the nutrient.
3 Joining data
📖 Required Reading: R4DS Chapter 21 (Joins)
Because these chapters do a much better job visualizing the concepts, I’ve chosen not to record additional videos.
✅ Check-in: Practice with Joins
The following code creates three datasets / tibbles:
Here is what they look like once created:
prof_info
prof_info# A tibble: 4 × 3
professor undergrad_school grad_school
<chr> <chr> <chr>
1 Bodwin Harvard UNC
2 Glanz Cal Poly Boston University
3 Robinson Winona State University University of Nebraska-Lincoln
4 Theobold Colorado Mesa University Montana State University
prof_course # A tibble: 4 × 4
professor Stat_331 Stat_330 Stat_431
<chr> <lgl> <lgl> <lgl>
1 Bodwin TRUE FALSE TRUE
2 Robinson TRUE TRUE TRUE
3 Theobold TRUE FALSE TRUE
4 Carlton TRUE FALSE FALSE
course_info course num_sections
1 Stat_331 8
2 Stat_330 3
3 Stat_431 1
These datasets contain information about five Cal Poly professors, their educational history, the classes they are able to teach, and the number of sections of each class that need to be assigned.
a) Combine datasets 1 and 2 to make this dataset:
professor undergrad_school grad_school Stat_331 Stat_330 Stat_431
<chr> <chr> <chr> <lgl> <lgl> <lgl>
1 Bodwin Harvard UNC TRUE FALSE TRUE
2 Robinson Winona State University University of Nebraska-Lincoln TRUE TRUE TRUE
3 Theobold Colorado Mesa University Montana State University TRUE FALSE TRUE
b) Combine datasets 1 and 2 to make this dataset:
professor Stat_331 Stat_330 Stat_431 undergrad_school grad_school
<chr> <lgl> <lgl> <lgl> <chr> <chr>
1 Bodwin TRUE FALSE TRUE Harvard UNC
2 Robinson TRUE TRUE TRUE Winona State University University of Nebraska-Lincoln
3 Theobold TRUE FALSE TRUE Colorado Mesa University Montana State University
4 Carlton TRUE FALSE FALSE NA NA
c) Combine datasets 2 and 3 to make this dataset:
professor course can_teach num_sections
<chr> <chr> <lgl> <dbl>
1 Bodwin Stat_331 TRUE 8
2 Bodwin Stat_330 FALSE 3
3 Bodwin Stat_431 TRUE 1
4 Robinson Stat_331 TRUE 8
5 Robinson Stat_330 TRUE 3
6 Robinson Stat_431 TRUE 1
7 Theobold Stat_331 TRUE 8
8 Theobold Stat_330 FALSE 3
9 Theobold Stat_431 TRUE 1
10 Carlton Stat_331 TRUE 8
11 Carlton Stat_330 FALSE 3
12 Carlton Stat_431 FALSE 1